5. Computational Complexity Theory
	The  analysis  of algorithms and theory of computational complexity seek
to establish upper and lower bounds on the minimum amount of computation
required  by  various  problems,  and to make these bounds as tight as possible.
Computational  complexity  also  establishes  coarse  equivalence   classes   of
computational problems.
	To date there are many more upper bounds on computation than  there  are
lower  bounds.   Upper  bounds are more easily established because any algorithm
which accomplishes a given function establishes an upper bound  on  the  minimum
computation  required.  Fortunately, greater emphasis is now being placed on the
search for lower bounds, and it is these that are needed to ensure that
cryptanalysis is computationally infeasible.  As work progresses in this area of
computer science it may  unwittingly  provide  the  basis  for  provably  secure
cryptosystems.    Conversely, if part of the security of a cryptosystem is based
on the assumed difficulty of a widely studied problem,  such  as  the  traveling
salesperson problem, it is likely that any flaw in this assumption will be found by a disinterested third party rather than by one of our opponents.  Even if an opponent were to
find  such  a  flaw  in  our  system, the incentive to publish the solution to a
famous problem might outweigh the advantage of being able to read our mail.   We
thus  feel  that cryptosystems which make use of generally believed lower bounds
on computation to add to their security are of interest even though  the  theory
of computational complexity is still in its infancy.
	We will first review several results in computational complexity to illustrate some basic concepts.  The sorting problem is one of the few problems which currently have tight lower bounds on computation.  Although it does not appear to be useful in cryptography, it illustrates several points of interest.  The problem is, given a randomly ordered sequence of n different numbers {X1, X2, ... Xn}, to sort them into increasing order X1'<X2'< ... <Xn'.  The first
problem in establishing the complexity of this  operation  is  to  decide  on  a
measure  for  computation.  On a computer, there will be two primary operations:
comparing two numbers in the list to see which is larger, and  then  rearranging
the  list  on  the  basis  of  these  comparisons.    While the time it takes to
rearrange the list is not negligible in practice, for the sake of simplicity we will count only the number of comparison operations required in defining the
complexity of the sorting problem.
	As  is well known, each comparison yields at most one bit of information
and there are log2(n!) ~ n log2(n) bits of uncertainty concerning the initial ordering.  Therefore the complexity K of the sorting problem obeys K ≥ log2(n!) ~ n log2(n).
	We  have brushed over an important question.  Does complexity refer to a
worst case, average, or some other situation?  Using the source coding theorem [] we can show that the above bound can be interpreted as a bound on both the worst case and,
somewhat more strongly, on the average computation (number of comparisons).  But
even  average  computation  is  too  weak  for  cryptographic  purposes.      If
cryptanalysis requires 10↑20 operations on the  average,  but  this  is  due  to
10↑-10   of  the  cryptograms  requiring  10↑30  operations  and  the  remaining
cryptograms requiring no time to break, we have a very weak cipher.   Rather  we
need  results  more  closely  related to the idea of "typicality" in information
theory.
	Fortunately, use of the Kraft inequality [] allows us to show that for any sorting algorithm the fraction nα of orderings requiring at most log2(n!)-α comparisons obeys the bound nα ≤ 2↑-α.  For example, if n=64 then log2(n!) = 296 bits, and the fraction of orderings requiring 280 comparisons or less is 1.5*10↑-5.  The fraction requiring 200 comparisons or less is 1.3*10↑-29.
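	These figures are easy to check numerically.  The short calculation below, written in present-day Python purely for concreteness (the language and names are ours, not part of the analysis), computes log2(n!) for n=64 and the bound 2↑-α on the fraction of orderings sortable with log2(n!)-α or fewer comparisons.

        import math

        n = 64
        bits = math.log2(math.factorial(n))      # uncertainty of the initial ordering
        print(round(bits))                       # prints 296

        for comparisons in (280, 200):
            alpha = bits - comparisons           # shortfall below log2(n!)
            print(comparisons, 2.0 ** -alpha)    # bound on the fraction of orderings
                                                 # sortable with this many comparisons:
                                                 # 280 -> about 1.5e-5, 200 -> about 1.3e-29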
	Having derived a lower bound on computation, we might next wonder  about
its  tightness.       There  are  algorithms[]  which  require  only  n(log2n+1)
comparisons, even in the worst case.   However, because  they  require  a  large
number  of  memory  accesses, they are not as useful as some other algorithms []
which take about 2n log2(n) comparisons on the average and n↑2 in the worst case.  This points to a general problem:  if one type of operation is not counted in evaluating complexity, a low complexity algorithm will often use too many of these operations to be of practical value.  If, for example, only multiplications
and divisions are counted, a "low complexity" algorithm will do these operations
by repeated addition or subtraction.
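	To illustrate what it means to count only comparisons, the following sketch is a merge sort instrumented to report how many comparisons it makes; the function name and the choice of Python are ours.  For randomly ordered inputs with n=64 the count comes out a little above the lower bound of roughly 296.

        import random

        def merge_sort_counting(xs):
            # Returns (sorted list, number of comparisons used).
            if len(xs) <= 1:
                return list(xs), 0
            mid = len(xs) // 2
            left, cl = merge_sort_counting(xs[:mid])
            right, cr = merge_sort_counting(xs[mid:])
            merged, count, i, j = [], cl + cr, 0, 0
            while i < len(left) and j < len(right):
                count += 1                       # one comparison of two list elements
                if left[i] <= right[j]:
                    merged.append(left[i]); i += 1
                else:
                    merged.append(right[j]); j += 1
            merged.extend(left[i:])
            merged.extend(right[j:])
            return merged, count

        xs = random.sample(range(1000), 64)
        print(merge_sort_counting(xs)[1])        # typically a little over 300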
	In using  computational  complexity  to  certify  a  cryptosystem  these
problems are largely absent.  We do not need tight lower bounds on cryptanalytic
computation.  Rather, any sufficiently large lower bound will do.   If  we  have
not  counted  one  type  of operation, that only adds to the cryptanalyst's task
over and above our lower bound.
	However, if we neglect too many real costs in measuring complexity, sufficiently large lower bounds may fail to exist.  For example, if memory is not included in the complexity measure, one-way functions cannot exist because table lookup can always be used for f↑-1(.).  Since this requires enormous precomputation, the existence of quasi one-way functions might still be proved.
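	The table lookup argument is easily made concrete.  The sketch below (again in Python; the particular f used is an arbitrary stand-in chosen only because it is one-to-one) inverts any function on a finite domain with a single probe, at the price of precomputation and memory proportional to the size of the domain.

        def build_inverse_table(f, domain):
            # Precompute f(x) -> x for every x in the domain;
            # afterwards f↑-1(.) is a single table lookup.
            return {f(x): x for x in domain}

        q = 65537
        # 3 is a primitive root of 65537, so x -> 3^x mod q is one-to-one on 1..q-1.
        table = build_inverse_table(lambda x: pow(3, x, q), range(1, q))
        assert table[pow(3, 1234, q)] == 1234    # inversion is now a single lookup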
	Unfortunately, most of the  currently  known  results  in  computational
complexity are not applicable to cryptography.  For example, the n log2(n)
complexity of sorting is too small to be useful.  Similarly, although evaluation
of  an nth degree polynomial is known to require n multiplies and n additions in
general[], this is also too small for cryptographic purposes.
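	The matching upper bound is achieved by Horner's rule, sketched below in Python (the function name is ours); it evaluates an nth degree polynomial with exactly n multiplications and n additions.

        def horner(coeffs, x):
            # coeffs = [a_n, a_{n-1}, ..., a_1, a_0]; evaluates
            # a_n*x^n + ... + a_1*x + a_0 with n multiplications and n additions.
            result = coeffs[0]
            for a in coeffs[1:]:
                result = result * x + a
            return result

        print(horner([2, 0, -1, 5], 3))          # 2*27 + 0*9 - 1*3 + 5 = 56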
	Let   us   therefore  jump  from  these  easily  computed  functions  to
uncomputable functions.  These are functions for which no algorithm will  always
give  the  correct answer.  For example, although it is obviously a well defined
function, it is impossible to  compute  the  function  from  the  set  of  ALGOL
programs  to  the  set {HALTS, RUNS FOREVER}, in which f(program) = HALTS if and
only if  the  program  eventually  reaches  the  STOP  statement  [].    We  are
considering  programs  which  include  all input data as part of the program and
which can address an unlimited memory.
	One  cannot merely let the program run until it stops to get the answer.
If after a trillion instructions the  program  is  still  running  what  is  our
answer?  In fact, Blum[] has shown that the truncated halting problem of
deciding whether a program stops within N steps will often require  at  least  N
steps to decide.
	Unfortunately,   uncomputable   functions   are   as   inapplicable   to
cryptography  as  are  the operations such as sorting whose complexity is easily
bounded.  This is because all cryptanalytic problems belong to a class known  as
NP  (for  nondeterministic  polynomial).    As  shown  below,  an  NP problem is
computable by definition.
	A   function   is  said  to  belong  to  the  complexity  class  P  (for
deterministic polynomial) if it can be computed  by  a  deterministic  universal
computer  in a time which is always bounded above by some polynomial function of
the length of its input.   One might think of  these  as  the  class  of  easily
computed functions, but it is more accurate to say that a function not in this
class must be hard to compute for at least some inputs.    There  are  problems,
such as the truncated halting problem, which are clearly not in the class P.
	There  is  an  interesting  class  of  problems  which  often  arise  in
engineering  and  which  cannot  be  solved  in  polynomial  time  by  any known
techniques unless they are run  on  a  computer  with  an  unlimited  degree  of
parallelism.    These problems may not belong to the class P, but clearly belong
to the class NP (for non-deterministic, polynomial) of  problems  which  can  be
solved  in  polynomial time on a "non-deterministic" computer (i.e., one with an
unlimited degree of parallelism).  Clearly the class NP includes the class P, and
one  of the great open questions in complexity theory is whether the class NP is
strictly larger.
	Among  the problems known to be solvable in NP time, but not known to be
solvable in P time, are  the  traveling  salesperson  problem,  the  problem  of
deciding tautologies, the knapsack problem, the graph coloring problem, and many
scheduling and minimization problems [].  We see that it is not lack of interest
or  work  which  has prevented people from finding solutions in P time for these
problems.  It is thus strongly believed that at least one of these problems
must not be in the class P, and that therefore the class NP is strictly larger.
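	The flavor of these problems can be seen from the knapsack problem mentioned above.  The sketch below (Python, purely illustrative) decides whether some subset of n given numbers sums to a target by exhaustive search over all 2↑n subsets; no technique is known which is guaranteed to do substantially better for all inputs.

        from itertools import combinations

        def knapsack_decision(weights, target):
            # Does some subset of the weights sum exactly to the target?
            # Exhaustive search over all 2**n subsets.
            for r in range(len(weights) + 1):
                for subset in combinations(weights, r):
                    if sum(subset) == target:
                        return True
            return False

        print(knapsack_decision([14, 28, 56, 82, 90, 132, 197, 284, 341, 455], 515))
        # True: 14 + 28 + 132 + 341 = 515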

	Karp[] has shown that there is a subclass of NP called the  NP  complete
problems which are all in P if any one of them is.  Karp lists twenty-one
problems which are NP complete, including all of the problems mentioned above.
	As  shown in the last section, the NP complete problems show promise for
cryptographic use.   But also as shown there, the  currently  stated  nature  of
their difficulty (i.e., worst case) makes them not directly of use.  Further, if
we replace worst case computation time with average or typical computation  time
as  our  complexity  measure, the current proof of the equivalences among the NP
complete problems is no longer valid.  This suggests several interesting  topics
for research.
	At this point, we leave the current bounds of  computational  complexity
theory  and  turn to the problem of logs mod q, which was introduced in the last
section as a potential one-way function.  The only reason that this  problem  is
currently  outside  of  computational complexity is that no one has successfully
established a lower bound on  its  computation.     As  we  shall  see,  if  its
complexity could be established to be adequately high, it would yield a provably
secure cryptosystem.  Since the calculation of logs mod q also is of  importance
in other areas of information theory [], we suggest it deserves further study.
	The cipher would be implemented as C=P↑K, mod  q;  P=C↑D,  mod  q  where
DK=1,  mod  q-1  (since  a↑q=a);  and  P  and  C  are integers between 1 and q-1
inclusive.  For D to exist it is necessary that K be relatively  prime  to  q-1,
but  since  a  significant  fraction of integers 1≤K≤q-1 are relatively prime to
q-1, the key space is not greatly reduced.  Further, since it is relatively easy
to  find  D  from  K (or to determine that K is not usable), generation of keys,
enciphering, and deciphering are all simple operations  whose  complexity  grows
like log(q).
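	A minimal sketch of this cipher in Python follows (the function names are ours and are chosen only for illustration); key generation, enciphering, and deciphering each amount to a modular exponentiation or a modular inverse, whose cost grows like log(q).

        import math

        def make_key_pair(K, q):
            # K must be relatively prime to q-1 for the deciphering exponent D to exist.
            if math.gcd(K, q - 1) != 1:
                raise ValueError("K is not usable")
            D = pow(K, -1, q - 1)          # D*K = 1, mod q-1  (modular inverse, Python 3.8+)
            return K, D

        def encipher(P, K, q):
            return pow(P, K, q)            # C = P^K, mod q

        def decipher(C, D, q):
            return pow(C, D, q)            # P = C^D, mod q

        q = 65537                          # a prime, used here only as a small example
        K, D = make_key_pair(12345, q)
        C = encipher(4321, K, q)
        assert decipher(C, D, q) == 4321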
	Cryptanalysis under a known plaintext attack, however, is equivalent  to
finding logs mod q since K = logP(C), mod q.  Knowledge of a number of randomly
chosen P-C pairs cannot reduce the difficulty of cryptanalysis since  equivalent
information  can  be  obtained from a single P-C pair by raising both P and C to
the same, randomly chosen set of powers.  If C=P↑K then C'=C↑m and P'=P↑m are also a P-C pair since C'=(P')↑K.
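	This observation is easily verified numerically, as in the fragment below (continuing the Python sketch above with an arbitrarily chosen small prime, key, and power).

        q, K = 65537, 12345
        P = 4321
        C = pow(P, K, q)                   # a known P-C pair
        m = 999                            # an arbitrarily chosen power
        P2, C2 = pow(P, m, q), pow(C, m, q)
        assert pow(P2, K, q) == C2         # (P', C') is also a valid pair: C' = (P')^K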
	While we cannot show that a chosen plaintext attack also reduces to  the
hopefully  difficult  problem  of  finding logs mod q, neither have we found any
methods by which a chosen plaintext attack can benefit.  Note  the  mathematical
simplification  which results from considering a known plaintext as opposed to a
statistical attack.
	As  previously noted, Knuth [] gives an algorithm which requires O(q↑.5)
operations and O(q↑.5) words of memory.  For q=2↑1000 these requirements are so large that they could not be met by any developments in computer technology.  This is
because quantum limitations set bounds on the minimum amount of energy necessary
to do even the simplest operation [].
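	For concreteness, one algorithm with the stated requirements is the giant-step/baby-step method attributed to Shanks, sketched below in Python; whether or not it is the algorithm Knuth describes, it recovers K from a single P-C pair using about q↑.5 multiplications and a table of about q↑.5 entries, which is hopeless for q near 2↑1000.

        import math

        def shanks_log(P, C, q):
            # Find K with P^K = C, mod q, using about q**0.5 steps and q**0.5 words of memory.
            m = math.isqrt(q - 1) + 1
            baby, x = {}, 1
            for j in range(m):             # baby steps: store P^j -> j
                baby[x] = j
                x = x * P % q
            giant = pow(pow(P, m, q), q - 2, q)    # P^(-m), mod q  (q prime)
            y = C
            for i in range(m):             # giant steps: look for C * P^(-im) in the table
                if y in baby:
                    return i * m + baby[y]
                y = y * giant % q
            return None

        q = 65543
        print(shanks_log(5, pow(5, 31416, q), q))  # recovers 31416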
	However, in studying the problem of logs mod q for cryptographic
purposes, Stephen Pohlig[] of Stanford found a technique which allows logs mod q
to be calculated in time proportional to log(q) provided q-1 has no large  prime
factors.  For example, if q=65537=2↑16+1, then q-1=2↑16 has 2 as its largest prime factor and the calculation takes only 16 iterations of a moderately simple loop.  On the other hand, if q=65543 then q-1 = 2(32771), with 32771 prime.  Pohlig's method then requires on the order of 2(32771↑.5) = 362 words of memory and 32771↑.5 log2(32771↑.5) = 1358 operations.  In general, if q=2p+1 where p is also prime then Pohlig's method requires almost as much computation and memory as
Knuth's.   For logs mod q to be useful in a provably secure cryptosystem we must
show that adequately large primes of the form q=2p+1 exist, and that for  primes
of  this  form  the calculation of logs mod q takes at least q↑.5 (or some other
power of q) operations.
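	The 16-iteration loop mentioned above for q=65537 can be sketched as follows (Python again; this is a standard bit-by-bit procedure consistent with the description, not necessarily Pohlig's own formulation).  It assumes P is a primitive root of q and recovers one bit of K per iteration when q-1 is a power of 2.

        def log_mod_fermat_prime(P, C, q, k):
            # q - 1 = 2**k and P is a primitive root of q; find K with P^K = C, mod q.
            K = 0
            P_inv = pow(P, q - 2, q)                    # P^(-1), mod q
            for i in range(k):                          # one iteration per bit of K
                D = C * pow(P_inv, K, q) % q            # strip off the bits already found
                if pow(D, (q - 1) >> (i + 1), q) != 1:  # equals -1 exactly when bit i is 1
                    K |= 1 << i
            return K

        q = 65537                                       # 2^16 + 1, so q-1 = 2^16
        print(log_mod_fermat_prime(3, pow(3, 54321, q), q, 16))   # recovers 54321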